-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Conversation
As an integration plan, I'd suggest adding these runtime APIs alongside the existing ones, postfixing all the old ones with |
Yields the validator-set at the state of a given block. This validator set is always the one responsible for backing parachains in the child of the provided block. | ||
|
||
```rust | ||
fn validators() -> Vec<ValidatorId>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn validators() -> Vec<ValidatorId>; | |
fn validators(parent_hash) -> Option<Vec<ValidatorId>>; |
Without the parent hash, we're not providing a context from which to get a validator set; the runtime can't know whether we're requesting the current set, the set for the next block, or a historical set.
The Option
is meant to handle the case that a validator set is unavailable: an invalid or unknown parent_hash
, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, by that logic we should make all of these Option
al. I would rather bake in an assumption that the parent hash is valid as that's closer to what the underlying code provides. Invalid parent-hashes just return runtime API errors.
impl GroupRotationInfo { | ||
/// Returns the index of the group needed to validate the core at the given index, | ||
/// assuming the given amount of cores/groups. | ||
fn group_for_core(core_index: usize, cores: usize) -> usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn group_for_core(core_index: usize, cores: usize) -> usize; | |
fn group_for_core(core_index: ValidatorIndex, core: usize) -> usize; |
If we have a ValidatorIndex
newtype, we may as well use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// If this core is freed by being timed-out, this is the assignment that is next up on this | ||
/// core. None if there is nothing queued for this core or there is no possibility of timing | ||
/// out. | ||
next_up_on_time_out: Option<ScheduledCore>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having two next_up_on
fields feels like it introduces the potential for confusion. When would we want next_up_on_time_out
to differ from next_up_on_available
? If they do differ, is it possible for a core both to timeout and to become available simultaneously? If so, what's the actual next scheduled core?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the runtime has all the answers".
We don't want them to differ, and yes, it is confusing, but this is purely exposing information about the system described in the scheduler module. It would be nice to have a clear "next up" in all cases, but deeper reading on the scheduler module will reveal why that is not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible for a core both to timeout and to become available simultaneously
No, they are mutually exclusive. These are the two different paths an Occupied
core can take to the Free
state. However, depending on which path is taken (which is unpredictable as it's based on the view/honesty of the block producer), the scheduling metadata can be different, leading to a different assignment onto the now-Free
core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we should always optimize for the next_up_on_available
case in Node-side code. The reason being that timeouts are only possible at a small subset of blocks, as they are only triggered within the short span of time directly following a group rotation. And even when timeouts can be triggered, unless validators are offline, they will not be reached before availability. And if validators are offline, it's fine to degrade throughput of paras somewhat.
However this runtime API is in the interest of making all information available to the node so more advanced strategies can be taken as research evolves.
@@ -7,6 +7,14 @@ In a way, this entire guide is about these candidates: how they are scheduled, c | |||
|
|||
This section will describe the base candidate type, its components, and variants that contain extra data. | |||
|
|||
## Para Id | |||
|
|||
A unique 32-bit identifier referring to a specific para (chain or thread). The relay-chain runtime guarantees that `ParaId`s are unique for the duration of any session, but recycling and reuse over a longer period of time is permitted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter for the purpose of this PR, but I'm curious: are the wrapped u32
s hashes of some kind, or handed out sequentially, or what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't described the registrar in the guide yet, but I think it gives them out sequentially. Parachain IDs start at 0
and parathread IDs start with some of the higher bits set, although I don't remember exactly what.
Reuse is fine as long as it's at least a few sessions apart, although it would be best not to reuse until having cycled completely. We could alternatively use a generation/index system like this for handing out IDs to avoid reuse over a long period of time: http://bitsquid.blogspot.com/2014/08/building-data-oriented-entity-system.html
The uniqueness property is what's important here and the details of the registrar are free to change
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I have a question about error handling though. None of the proposed runtime APIs return a Result
. Question: at which layer a storage access error is handled and how or is it always infallible?
/// Returns the validator groups and rotation info localized based on the block whose state | ||
/// this is invoked on. Note that `now` in the `GroupRotationInfo` should be the successor of | ||
/// the number of the block. | ||
fn validator_groups(at: Block) -> (Vec<Vec<ValidatorIndex>>, GroupRotationInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to have a type alias for a validator group as well
type ValidatorGroup = Vec<ValidatorIndex>;
/// Fetch the value of the runtime API at the block. | ||
/// | ||
/// Definitionally, the `at` parameter cannot be any block that is not in the chain. | ||
/// Thus the return value is unconditional. However, for in-practice implementations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ordian Does this address your Result
question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partially, I should have been more explicit in my question, this was more substrate related question than to this PR, namely how low-level db errors are handled. And from what I see, https://github.com/paritytech/substrate/pull/3997/files, either the errors will be swallowed or a panic will occur. Anyway, this is handled on a different level.
Will merge this as it is directionally correct and address follow-ons in #1411 |
No description provided.